Sound Event Detection in Synthetic Audio: Analysis of the DCASE 2016 Task Results
As part of the 2016 public evaluation challenge on Detection and
Classification of Acoustic Scenes and Events (DCASE 2016), the second task
focused on evaluating sound event detection systems using synthetic mixtures of
office sounds. This task, which follows the "Event Detection - Office
Synthetic" task of DCASE 2013, studies the behaviour of tested algorithms when
facing controlled levels of audio complexity with respect to background noise
and polyphony/density, with the added benefit of a very accurate ground truth.
This paper presents the task formulation, evaluation metrics, and submitted
systems, and provides a statistical analysis of the results achieved with
respect to various aspects of the evaluation dataset.
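Sound event detection tasks of this kind are typically scored with segment-based metrics that count substitutions, deletions, and insertions against the reference. A minimal sketch of a segment-based error rate over binary activity matrices is given below; the function name and data layout are illustrative, not taken from the DCASE toolchain.

```python
import numpy as np

def segment_error_rate(ref, est):
    """Segment-based error rate for sound event detection.

    ref, est: binary arrays of shape (n_segments, n_classes),
    1 where an event class is active in a segment.
    ER = (S + D + I) / N, where N is the number of active
    reference entries.
    """
    ref = np.asarray(ref, dtype=bool)
    est = np.asarray(est, dtype=bool)
    fn = np.logical_and(ref, ~est).sum(axis=1)  # misses per segment
    fp = np.logical_and(~ref, est).sum(axis=1)  # false alarms per segment
    s = np.minimum(fn, fp)  # substitutions: a miss paired with a false alarm
    d = fn - s              # deletions (unpaired misses)
    i = fp - s              # insertions (unpaired false alarms)
    n = ref.sum()
    return (s.sum() + d.sum() + i.sum()) / n

# Two segments, three event classes: one class missed, one spuriously
# detected in the same segment, giving one substitution.
ref = [[1, 0, 1], [0, 1, 0]]
est = [[1, 1, 0], [0, 1, 0]]
print(segment_error_rate(ref, est))  # 1 substitution / 3 active refs = 0.333...
```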
The bag-of-frames approach: a not so sufficient model for urban soundscapes
The "bag-of-frames" approach (BOF), which encodes audio signals as the
long-term statistical distribution of short-term spectral features, is commonly
regarded as an effective and sufficient way to represent environmental sound
recordings (soundscapes) since its introduction in an influential 2007 article.
The present paper describes a conceptual replication of this seminal article
using several new soundscape datasets, with results strongly questioning the
adequacy of the BOF approach for the task. We show that the good accuracy
originally reported with BOF likely results from a particularly favorable
dataset with low within-class variability, and that for more realistic
datasets, BOF in fact does not perform significantly better than a mere
one-point average of the signal's features. Soundscape modeling, therefore,
may not be the closed case it was once thought to be. Progress, we argue,
could lie in reconsidering the problem so as to account for the individual
acoustical events within each soundscape.
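The contrast drawn in this abstract — long-term statistics of short-term features versus a single averaged feature vector — can be sketched in a few lines. The sketch below uses log energy and spectral centroid as stand-in short-term features (BOF systems typically use MFCCs); all names are illustrative.

```python
import numpy as np

def frame_features(signal, frame_len=512, hop=256):
    """Short-term features per frame: log energy and spectral
    centroid, as simple stand-ins for MFCCs."""
    n = 1 + (len(signal) - frame_len) // hop
    feats = []
    for i in range(n):
        frame = signal[i * hop : i * hop + frame_len]
        spec = np.abs(np.fft.rfft(frame * np.hanning(frame_len)))
        energy = np.log(np.sum(spec ** 2) + 1e-12)
        bins = np.arange(len(spec))
        centroid = np.sum(bins * spec) / (np.sum(spec) + 1e-12)
        feats.append([energy, centroid])
    return np.array(feats)

def bag_of_frames(signal):
    """BOF descriptor: long-term statistics (here mean and std) of
    the short-term features, discarding temporal order."""
    f = frame_features(signal)
    return np.concatenate([f.mean(axis=0), f.std(axis=0)])

def one_point_average(signal):
    """The 'mere one-point average' baseline: the mean alone."""
    return frame_features(signal).mean(axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal(8000)
print(bag_of_frames(x).shape)      # (4,) -- mean and std of 2 features
print(one_point_average(x).shape)  # (2,) -- mean only
```

Both descriptors would then feed a standard classifier; the paper's point is that on realistic soundscape data the extra distributional statistics buy little.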
On the visual display of audio data using stacked graphs
Visualisation is an important tool at many stages of a research project. In this paper, we present several displays of audio data based on stacked graphs. Thanks to a careful use of layering, the proposed displays concisely convey a large amount of information. Many flavours are presented, each useful for a specific type of data, from spectral and chromatic data to multi-source and multi-channel data. We demonstrate that, for spectral and chromatic data, such displays offer a different compromise than the traditional spectrogram and chromagram, emphasizing timing information over frequency.
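The layered construction behind such a stacked display can be sketched as band energies accumulated over time; the actual rendering (e.g. with matplotlib's `stackplot`) is omitted here, and the band layout is a simplifying assumption.

```python
import numpy as np

def band_energies(signal, n_bands=4, frame_len=256, hop=128):
    """Per-frame energy in n_bands equal-width frequency bands --
    the layers of a stacked spectral display."""
    n = 1 + (len(signal) - frame_len) // hop
    spec = np.array([
        np.abs(np.fft.rfft(signal[i * hop : i * hop + frame_len]
                           * np.hanning(frame_len))) ** 2
        for i in range(n)
    ])                                                 # (frames, bins)
    bands = np.array_split(spec, n_bands, axis=1)
    return np.stack([b.sum(axis=1) for b in bands])    # (n_bands, frames)

def stacked_layers(energies):
    """Cumulative baselines: layer k is drawn between rows k and
    k+1, which is exactly what a stacked graph renders."""
    return np.vstack([np.zeros(energies.shape[1]),
                      np.cumsum(energies, axis=0)])

rng = np.random.default_rng(1)
e = band_energies(rng.standard_normal(4000))
layers = stacked_layers(e)
# The top boundary of the stack equals the total energy per frame.
print(np.allclose(layers[-1], e.sum(axis=0)))  # True
```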
Large-scale feature selection with Gaussian mixture models for the classification of high dimensional remote sensing images
A large-scale feature selection wrapper is discussed for the classification of high-dimensional remote sensing images. An efficient implementation is proposed based on intrinsic properties of Gaussian mixture models and block matrix operations. The criterion function is split into two parts: one that is updated to test each candidate feature, and one that needs to be updated only once per feature selection step. This split saves a large amount of computation for each test. The algorithm is implemented in C++ and integrated into the Orfeo Toolbox. It has been compared to other classification algorithms on two high-dimensional remote sensing images. Results show that the approach provides good classification accuracy with low computation time.
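The core trick — splitting the selection criterion into a part cached once and a cheap per-candidate update — can be illustrated with a deliberately simplified model. The sketch below uses diagonal-covariance Gaussians, whose training log-likelihood decomposes as a sum over features; this is far simpler than the paper's GMM wrapper and block-matrix machinery, and all names are illustrative.

```python
import numpy as np

def per_feature_scores(X, y):
    """For diagonal-covariance Gaussian class models, the training
    log-likelihood decomposes over features, so each feature's
    contribution can be computed once and cached."""
    scores = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        var = Xc.var(axis=0) + 1e-9
        # Log-likelihood of class data under its own 1-D Gaussians.
        scores += -0.5 * len(Xc) * (np.log(2 * np.pi * var) + 1)
    return scores

def forward_select(X, y, k):
    """Sequential forward selection: the cached per-feature terms
    are the 'updated once' part; testing a candidate merely adds
    its term to the running criterion."""
    cached = per_feature_scores(X, y)  # computed once
    selected, running = [], 0.0
    for _ in range(k):
        best = max((f for f in range(X.shape[1]) if f not in selected),
                   key=lambda f: running + cached[f])  # cheap per-candidate test
        running += cached[best]
        selected.append(best)
    return selected

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3)) * np.array([1.0, 0.1, 10.0])
y = np.zeros(200, dtype=int)
print(forward_select(X, y, 2))  # [1, 0]: tightest features score highest
```

With full covariances, as in the paper, the per-candidate update is no longer a simple sum and requires the block-matrix identities the authors exploit.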
GMM-based classification from noisy features
We consider Gaussian mixture model (GMM)-based classification from noisy features, where the uncertainty over each feature is represented by a Gaussian distribution. For that purpose, we first propose a new GMM training and decoding criterion called log-likelihood integration which, as opposed to the conventional likelihood integration criterion, does not rely on any assumption regarding the distribution of the data. Secondly, we introduce two new Expectation Maximization (EM) algorithms for the two criteria, which make it possible to learn GMMs directly from noisy features. We then evaluate and compare the behaviours of the two proposed algorithms on a categorization task with artificial data and with speech data corrupted by additive artificial noise, assuming the uncertainty parameters are known. Experiments demonstrate the superiority of the likelihood integration criterion with the newly proposed EM learning in all tested configurations, thus giving rise to a new family of learning approaches that are insensitive to the heterogeneity of the noise characteristics between testing and training data.
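For the conventional likelihood integration criterion mentioned above, a standard result makes decoding simple: integrating a Gaussian observation uncertainty against a Gaussian component yields a Gaussian with the two covariances added. A minimal diagonal-covariance sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def integrated_loglik(x, x_var, weights, means, variances):
    """Likelihood-integration decoding for a diagonal GMM:
    integrating the observation uncertainty N(.; x, x_var) against
    each component N(.; mu_k, var_k) simply inflates the component
    variance to var_k + x_var."""
    comps = [np.log(w) + gaussian_logpdf(x, m, v + x_var)
             for w, m, v in zip(weights, means, variances)]
    return np.logaddexp.reduce(comps)

# A two-component 1-D GMM; with zero uncertainty this reduces to
# the ordinary GMM log-likelihood.
w = np.array([0.5, 0.5])
mu = np.array([[0.0], [3.0]])
var = np.array([[1.0], [1.0]])
x = np.array([1.0])
clean = integrated_loglik(x, np.array([0.0]), w, mu, var)
noisy = integrated_loglik(x, np.array([0.5]), w, mu, var)
print(clean, noisy)
```

The paper's contribution lies elsewhere — the log-likelihood integration criterion and EM algorithms that learn the GMM itself from noisy features — but this identity is the decoding baseline both criteria are compared against.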
Extended playing techniques: The next milestone in musical instrument recognition
The expressive variability in producing a musical note conveys information
essential to the modeling of orchestration and style. As such, it plays a
crucial role in computer-assisted browsing of massive digital music corpora.
Yet, although the automatic recognition of a musical instrument from the
recording of a single "ordinary" note is considered a solved problem, automatic
identification of instrumental playing technique (IPT) remains largely
underdeveloped. We benchmark machine listening systems for query-by-example
browsing among 143 extended IPTs for 16 instruments, amounting to 469 triplets
of instrument, mute, and technique. We identify and discuss three necessary
conditions for significantly outperforming the traditional mel-frequency
cepstral coefficient (MFCC) baseline: the addition of second-order scattering
coefficients to account for amplitude modulation, the incorporation of
long-range temporal dependencies, and metric learning using large-margin
nearest neighbors (LMNN) to reduce intra-class variability. Evaluating on the
Studio On Line (SOL) dataset, we obtain a precision at rank 5 of 99.7% for
instrument recognition (baseline at 89.0%) and of 61.0% for IPT recognition
(baseline at 44.5%). We interpret this gain through a qualitative assessment of
practical usability and visualization using nonlinear dimensionality reduction.
Comment: 10 pages, 9 figures. The source code to reproduce the experiments of
this paper is made available at:
https://www.github.com/mathieulagrange/dlfm201
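The precision-at-rank-5 figures quoted above come from query-by-example retrieval: rank database items by distance to each query and count same-label items among the top 5. A minimal sketch of that metric (names and toy data are illustrative):

```python
import numpy as np

def precision_at_k(query_feats, query_labels, db_feats, db_labels, k=5):
    """Mean precision at rank k for query-by-example retrieval:
    rank database items by Euclidean distance to each query and
    count same-label items among the top k."""
    dists = np.linalg.norm(query_feats[:, None, :] - db_feats[None, :, :],
                           axis=2)
    topk = np.argsort(dists, axis=1)[:, :k]
    hits = db_labels[topk] == query_labels[:, None]
    return hits.mean()

rng = np.random.default_rng(3)
# Two well-separated classes: retrieval should be perfect.
db = np.vstack([rng.normal(0, 0.1, (20, 4)), rng.normal(5, 0.1, (20, 4))])
labels = np.array([0] * 20 + [1] * 20)
q = np.vstack([rng.normal(0, 0.1, (3, 4)), rng.normal(5, 0.1, (3, 4))])
q_labels = np.array([0, 0, 0, 1, 1, 1])
print(precision_at_k(q, q_labels, db, labels))  # 1.0 for separated classes
```

In the paper the distance is computed in the learned (scattering + LMNN) feature space rather than raw Euclidean space, which is where the reported gains come from.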